Data processing system.

The invention relates to a data processing system comprising a first and a second processing module which are mutually coupled by a synchronizing buffer.

Two developments undermine the role of globally clocked VLSI circuits. In the first place, the trend towards system-on-chip designs leads to chips containing several processing modules which all have different cycle times. Secondly, in future technologies it will become increasingly difficult to distribute high-speed low-skew clock signals. Therefore, future chips will contain several locally clocked processing modules, which communicate through dedicated glue logic in the form of a synchronizing buffer. These heterogeneous systems are called GALS (Globally Asynchronous, Locally Synchronous) systems.
Such a synchronizing buffer is known from J. N. Seizovic. Pipeline synchronization. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 87-96, Nov. 1994. In the data processing system shown therein the pipeline comprises a synchronizer between each two pipeline elements. A synchronizer is disclosed having a relatively complicated structure, which comprises an RS flip-flop having a set input controlled by a first mutually exclusive (ME) gate, and a reset input controlled by a second ME gate. Each of the ME gates is synchronized with a clock signal and receives a respective request signal from a logic circuit in response to a request signal from a preceding pipeline element, a request signal from a succeeding element and an internal feedback signal.

It is a purpose of the invention to provide an improved data processing system which allows synchronization by relatively simple synchronization circuits. In accordance therewith the data processing system of the invention is defined by claim 1.

In the data processing system according to the invention the pipeline elements mutually control the dataflow via a two-phase handshake signal. Internally, however, a four-phase handshake signal is used to control the latches that latch the dataflow. As will be shown in the sequel, the pipeline elements in the data processing system according to the invention can be synchronized by very simple synchronization elements. In the following, reference will be made to the following Annex: "Bridging Clock Domains by synchronizing the mice in the mousetrap", Joep Kessels, Ad Peeters and Suk-Jin Kim.

It is noted that the pipeline elements are known as such from WO 02/35346. However, it is not disclosed therein to combine the pipeline elements with synchronizers. On the contrary, it is suggested that it is essential that the pipeline is asynchronous to enable it to interface with environments operating at different rates.

In an embodiment the synchronization element synchronizes the control signal with a phase of the clock signal. This can be realised with a two-phase wait component which can be constructed from a four-phase wait component as shown in Figure 5 of the Annex. The two-phase wait component comprises a comparison element, such as an XOR-gate, a four-phase wait component and a latch, wherein the latch receives a request signal in order to provide the latched request signal as an acknowledge signal. The request signal and the acknowledge signal are compared by the comparison element, which provides an input signal for the four-phase wait component, which synchronizes this signal with the clock signal to provide the control signal for the latch.
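
The construction of the two-phase wait component can be sketched behaviourally as follows. This is a minimal Python model under stated assumptions, not the circuit of the Annex: the names `wait4` and `Wait2` are hypothetical, the four-phase wait component is idealised as a combinational AND of its input and the clock, and arbitration and metastability are ignored.

```python
def wait4(request: bool, clk: bool) -> bool:
    """Idealised four-phase wait (assumption): pass the request only while clk is high."""
    return request and clk


class Wait2:
    """Two-phase wait built from a comparison element, a wait4 and a latch."""

    def __init__(self) -> None:
        self.ack = False  # latch output, provided as the acknowledge signal

    def step(self, req: bool, clk: bool) -> bool:
        differ = req != self.ack           # comparison element (XOR of req and ack)
        latch_enable = wait4(differ, clk)  # synchronised with the high clock phase
        if latch_enable:                   # latch transparent: ack follows req
            self.ack = req
        return self.ack


w = Wait2()
inputs = [(True, False), (True, True), (True, False), (False, False), (False, True)]
trace = [w.step(req, clk) for req, clk in inputs]
print(trace)  # [False, True, True, True, False]
```

In this sketch a transition on the request is only copied to the acknowledge while the clock is high; once the two signals are equal again the latch control signal drops by itself.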

In a preferred embodiment the synchronization element synchronizes the control signal with a transition (edge) of the clock signal. A four-phase edge synchronizer can be obtained from a combination of two four-phase wait components as is shown in Figure 8a of the Annex. Therein the first four-phase wait component synchronizes the input control signal with the inverted clock signal and provides a phase-synchronized signal to the second four-phase wait component, which synchronizes the so obtained signal with the clock signal.

Fjgow 7.a) of me Annex by one np-edge cowponent in between the two^pXe ^em^ a« 
edge two-phase (e4ge) tpnsition synchroni^tion element is birilt by^SSe^S" ^ 
compoi«ntsshovmmPigiffe8.aoflheAimex:byfow^^^^ 
gKnchwmzmg <m edges instead of phases ^^^^^ 

Depending on tvhedier the pipeline has to svntbwini'w. rJthTJ^^ tmaer. 
®5™™™«*™"*«'«ilmgI>K>oes5tog modulo cm 

synchroS'oretS^^^SSr4%^^-^^^^ 

wherein the pipeline comprise at least a fflSSal^ ^ ^ ^^^<^^i^S system, 
synchronization element wovidS Se ^^^i 1^"^' ^ ^^"^^^ 

synchronizing the ktcbed&s^contoSSS^ 
providedby the secondpKJcessing^S^^^,^^^^^ 

of the for phase ?y„chronling and e^^yn^^^^TespeXei?*"' ^^^^ ^ and 10 

Synchiomzmg with a reading processing module onSie Sf Si«e1fn« w ^i, 
miting processing modwle on the othCT aid of ^e™CKr^ i„ ®™<'*«»I»Pelmeandwii!ia 
of a write sectionfa read section aS^^Ste™S?^i^^^ "^^^P^^^ing the pipeline 

paragr^h 4 of tntemaediate section as is described in more detail in 

invenao^tiStrcSJS^;:^^ accordingto the 
control signal (Wreq) wifefhe laS flS^SS^r.T.r*'' the first 

providinglflw lafeh jintrol signaUdtfaf^p^^^^^^ f^^^V 
control signal (Rxeq) and ^^oM^Sl7mS ^^Ti ^- Z'^'^' ^ 
11 therewith is superfluous. Instead of using an edge synchronization in Figure 11, the control signal for the latch may be controlled by phase synchronization.

Figure 1 schematically shows a GALS system comprising a first and a second mutually asynchronous processor module 10, 20, which are coupled to each other by a synchronizing buffer 30. The buffer has an input side W (for Write) coupled to the first processor module 10 and an output side R (for Read) coupled to the second processor module 20. Each side has a clocked interface with an independent clock (Wclk and Rclk). All other input/output signals are valid at the rising edge of the corresponding clock signal. The clocked protocol at each buffer interface is not a handshake protocol, but a more appropriate symmetric rendez-vous protocol in which both the buffer and the environment indicate their readiness to perform a transfer operation by making a dedicated signal high.

Figure 1. Interface of synchronizing buffer

Bridging Clock Domains by synchronizing the mice in the mousetrap

Joep Kessels, Ad Peeters, and Suk-Jin Kim



Abstract

We present the design of a first-in first-out buffer that can be used to bridge clock domains in GALS (Globally Asynchronous, Locally Synchronous) systems. Both the input and output side of the buffer have an independently clocked interface. The design of these kinds of buffers inherently poses the problems of metastability and synchronization failure. In the proposed design the probability of synchronization failure can be decreased exponentially by increasing the buffer size. Consequently, at system level one can trade off between safety and low latency. The design is based on two well-known ideas: pipeline synchronization and mousetrap buffers. We first combine both ideas and then in two steps improve the design.

Keywords: GALS systems, data synchronizer, pipeline synchronization, mousetrap buffer, bridging clock domains.



1. Introduction

Two developments undermine the role of globally clocked VLSI circuits. In the first place, the trend towards system-on-chip designs leads to chips containing several IP modules which all have different cycle times. Secondly, in future technologies it will become increasingly difficult to distribute high-speed low-skew clock signals. Therefore, future chips will contain several locally clocked submodules, which communicate through dedicated glue logic. These heterogeneous systems are called GALS (Globally Asynchronous, Locally Synchronous) systems [1]. Two kinds of GALS systems can be distinguished depending on the way the synchronous submodules communicate.

• In a clock synchronization system, the submodules have so-called pausible clocks, which are ring oscillators that can be halted. Safe communication is obtained by synchronizing the clocks [9, 10, 14, 5, 6, 3].

• In a data synchronization system, the submodules have free-running clocks and the data being communicated from one clock domain to the other is synchronized. A simple solution is the well-known two-register or double-latch synchronizer [9, 5]. More elaborate synchronizing schemes are based on first-in first-out buffers. The solution presented in [2] uses (distributed) pointer-based buffers offering both low latency and low power dissipation. However, since reading and writing is done via a bus connecting all cells, it will be difficult to obtain a high throughput. This holds in particular when large distances have to be bridged. In [11] a solution is presented based on ripple buffers. Compared to pointer-based buffers, ripple buffers have longer latencies and dissipate more power. However, they allow distributed placements with short point-to-point interconnects. Therefore, ripple buffers can offer higher throughputs.

Author affiliation (footnote): Kwangju Institute of Science and Technology, Kwangju, 500-712

Data synchronization systems inherently have to deal with metastable states, which have to be resolved within a given time period. When choosing this period one has to trade off between safety and low latency. Clock synchronization systems are safer in that they wait until the metastable states have been resolved. Moreover, a large subset of clock synchronization systems can be designed without any arbitration [9, 10, 1, 6]. Despite this technical advantage of clock synchronization systems, synchronous designers tend to prefer the more familiar data synchronization systems.

We present a pipeline synchronizer based on the mousetrap buffer [12]. The paper is organized as follows. Section 2 gives the specification of the synchronizing buffer and in section 3 we introduce and analyse the design of the mousetrap buffer. In section 4 we apply pipeline synchronization in the mousetrap buffer. We first combine both ideas and then in two steps improve the design. In section 5 we summarize the differences between the three designs. All the designs that we discuss have been implemented with a 16-bit wide data path in a 0.18 µm CMOS technology. They have been simulated using back annotation with estimated wire loads.

Figure 1. Interface of synchronizing buffer

2. Specification of the synchronizing buffer

Fig. 1 shows a buffer with an input side W (for Write) and an output side R (for Read). Each side has a clocked interface with an independent clock (Wclk and Rclk). All other input/output signals are valid at the rising edge of the corresponding clock signal. The clocked protocol at each buffer interface is not a handshake protocol, but a more appropriate symmetric rendez-vous protocol in which both the buffer and the environment indicate their readiness to perform a transfer operation by making a dedicated signal high. The buffer signal is called rdy (for ready) and the signal from the environment is called enb (for enable). A data transfer only occurs if both the ready and enable signal are high at a rising clock edge. On the input side Wdat must be valid when Wenb is high and on the output side Rdat is valid when Rrdy is high. Note that the specification of the interface allows two buffers to be connected directly, provided the two interconnected interfaces share the same clock.

3. The mousetrap

The design of the buffer is based on the well-known mousetrap buffer [12], which is a ripple buffer using 2-phase single-rail handshake signalling for the communication between two neighbouring cells. The mousetrap buffer has several properties that make it very attractive for clock bridging in GALS systems.

• In many GALS systems clock bridging also implies bridging distances, in which case the transmission delays are important. For this reason a two-phase protocol is much more attractive than a four-phase protocol.

• The mousetrap buffer cell has a short cycle time, which means that it allows high throughputs (fast clocks) in the synchronizing buffers. Moreover, for a given clock frequency, a shorter cycle time implies a smaller probability of synchronization failure.

• An empty mousetrap buffer offers a minimum latency of only one latch delay per cell.

• All cells can be filled, which means that the buffer capacity is equal to the number of cells.

• Since the design does not contain any special asynchronous elements (such as C-elements), it is more easily understood and accepted by conventional synchronous designers, who are often the designers applying the GALS interfacing circuitry.

3.1. One cell

Fig. 2(a) shows the design of a mousetrap cell (MT) and Fig. 2(b) gives some invariants that hold when the cell is in a quiescent state. First we note that (read request) signal Rreq is equal to (write acknowledge) signal Wack (invariant 1). Moreover, signal empty is high if and only if the read handshake signals are equal (invariant 2). Signal empty controls a register consisting of a control bit and a set of data bits. If signal empty is high, all latches in the register are transparent, which means that the outputs of the latches are equal to the inputs (invariant 3). From these invariants it follows that if signal empty is high (a quiescent state that means waiting for an input), all four handshake signals are equal.

Fig. 2(c) describes the behaviour of the cell, for which we use the convention introduced in [7]. All cells start by initializing the control latch with outgoing signal Rreq to the same value, say false, which implies that initially all handshake signals are equal and, hence, all cells are empty. Subsequently the cell executes an endless loop ('A; B' means sequential execution of A and B and 'A*' means infinite repetition of A). In each step of the loop the cell first waits until the read handshake signals are equal ('[C]' means wait until C holds). When this happens, signal empty becomes high and the latches in the register become transparent. The cell now waits until signal Wreq differs from signal Rreq (from invariant (1) it follows that Wreq and Wack then differ), which means that its write neighbour is full (offering data). The transparent latches then copy the incoming data and control signals ('A || B' means concurrent execution of A and B). Consequently, the write handshake signals become equal (write neighbour becomes empty), and signal Rreq is inverted (cell becomes full). Signal empty then goes low making the latches opaque and the cell executes a next loop step. Note that passing a data item (full bucket) is done via the request signals, whereas the acknowledge signals are used to return an empty bucket.
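
The cell behaviour described above can be explored with a small untimed simulation. The sketch below is a hypothetical Python model, not the authors' circuit: the names `MousetrapCell`, `MousetrapFifo`, `offer` and `take` are illustrative assumptions, all delays are ignored, and the ripple is computed by iterating until no cell changes.

```python
class MousetrapCell:
    def __init__(self) -> None:
        self.rreq = False  # control latch output; equals Wack (invariant 1)
        self.data = None

    def step(self, wreq: bool, wdat, rack: bool) -> bool:
        empty = self.rreq == rack             # XNOR of the read handshake signals
        if empty and wreq != self.rreq:       # transparent and write neighbour full
            self.rreq, self.data = wreq, wdat  # copy control bit and data
            return True
        return False


class MousetrapFifo:
    def __init__(self, n: int) -> None:
        self.cells = [MousetrapCell() for _ in range(n)]
        self.wreq, self.wdat, self.rack = False, None, False

    def _settle(self) -> None:
        changed = True
        while changed:  # let buckets ripple until the buffer is quiescent
            changed = False
            for i in reversed(range(len(self.cells))):
                req = self.wreq if i == 0 else self.cells[i - 1].rreq
                dat = self.wdat if i == 0 else self.cells[i - 1].data
                last = i == len(self.cells) - 1
                ack = self.rack if last else self.cells[i + 1].rreq
                changed |= self.cells[i].step(req, dat, ack)

    def offer(self, item) -> bool:
        """Toggle Wreq to offer an item; True if it is acknowledged at once."""
        assert self.cells[0].rreq == self.wreq, "previous offer still pending"
        self.wreq, self.wdat = not self.wreq, item
        self._settle()
        return self.cells[0].rreq == self.wreq

    def take(self):
        """Acknowledge the item offered on the read side, if any."""
        last = self.cells[-1]
        if last.rreq == self.rack:
            return None  # read handshake signals equal: nothing offered
        item, self.rack = last.data, last.rreq
        self._settle()
        return item


fifo = MousetrapFifo(3)
accepted = [fifo.offer(x) for x in "abcd"]
taken = [fifo.take() for _ in range(5)]
print(accepted, taken)
```

In this model three offers are acknowledged by a three-cell buffer and the fourth stays pending until the reader takes an item, which matches the statement that the capacity equals the number of cells.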

Let us consider the feed-back loop circuit consisting of the equal gate and the control latch, which has two input signals Wreq and Rack and two output signals Wack and empty. If we ignore output signal empty, the circuit behaves as follows

\[ \big([(\mathit{Rreq} = \mathit{Rack}) \wedge (\mathit{Wreq} \neq \mathit{Rreq})];\ \mathit{Rreq} := \mathit{Wreq}\big)^{*} \]

which, since the signals are Boolean, is equivalent to

\[ \big([(\mathit{Wreq} \neq \mathit{Rack}) \wedge (\mathit{Wreq} \neq \mathit{Rreq})];\ \mathit{Rreq} := \mathit{Wreq}\big)^{*} \]

The condition that the two request signals must differ is superfluous, since it only prevents idle operations. Therefore the behaviour can be rewritten as

\[ \big([\mathit{Wreq} \neq \mathit{Rack}];\ \mathit{Rreq} := \mathit{Wreq}\big)^{*} \]

which is the behaviour of a symmetric C-element with an inverter at the Rack input. In this respect the design corresponds with the micropipeline design presented in [13].

Figure 2. Free-running mousetrap cell (MT)
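
The correspondence with a C-element can be checked exhaustively. The sketch below is an illustrative Python check, with hypothetical function names, that compares the next value of Rreq produced by the control latch (transparent when Rreq equals Rack) with that of a C-element whose inputs are Wreq and the inverse of Rack.

```python
from itertools import product


def control_latch(wreq: bool, rack: bool, rreq: bool) -> bool:
    """Next Rreq of the mousetrap control latch with its equal gate."""
    empty = rreq == rack             # equal (XNOR) gate
    return wreq if empty else rreq   # transparent when empty, else hold


def c_element_inverted_rack(wreq: bool, rack: bool, rreq: bool) -> bool:
    """Next output of a C-element with inputs Wreq and not Rack."""
    a, b = wreq, not rack
    return a if a == b else rreq     # follow the inputs when they agree, else hold


states = list(product([False, True], repeat=3))
mismatches = [s for s in states if control_latch(*s) != c_element_inverted_rack(*s)]
print(len(states), mismatches)  # 8 []
```

All eight combinations of Wreq, Rack and the current Rreq give the same next value, so the two descriptions agree as next-state functions in this untimed model.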

The circuit offers, however, more functionality. Additional output signal empty can be used to control the data latches in a clever way: it gives automatic delay matching (provided the enable signals are skew-free). During each two-phase handshake signal empty makes a complete latch control cycle from transparent to opaque and back to transparent again. Therefore a mousetrap buffer is only full if all cells are full.

The C-element in the mousetrap has one disadvantage: it contains an extended isochronic fork, since the output changes before the latches have become opaque (state is stored). Signal Wack indicates that the write data will be latched after a certain time (XNOR delay plus the hold time of a latch). Therefore, signal Wack comes with a timing constraint: signal Wreq should be stable until the latch is closed. If the write neighbour is also a mousetrap cell, this requirement is automatically fulfilled (we come back to the delay asymmetries in section 3.3). Note that transmission delays between the cells help to make the design more robust.

3.2. Two communicating cells

Fig. 3 shows two communicating mousetrap cells: a write and a read cell. We define the cycle time of the buffer as the minimum time it takes for two cells to communicate one symbol from the write cell to the read cell. From

fa It^^l^''^^'-^^ " in the wrim c6tt and cS 
jyCL^^intibereado^^ fiicmr«dmologytliiiiotarS 

fA^k^^ ^^"""^ shnulatioix results 

when the two-place buffer runs at full speed. Fig. 3(b) shows the internal communications. The write cell indicates that the data signals are valid by changing signal Req (thereby making the handshake signals differ). The read cell indicates that the data signals will be stored by changing signal Ack (thereby making the handshake signals equal again). The cycle time only includes the delays needed to pass on the handshake signals (rising edge delay of the XNOR-gate and latch delay). The events needed to close the latch are performed concurrently with the cycle activity of the read cell. The figure shows that the latches in the read cell are still transparent (signal empty is high) when the cell has signalled that it has stored the data (period e). This is due to the extended isochronic fork in the design and period e corresponds to an XNOR delay. Note that signal empty goes high at about the same time when signal Req makes a transition. From this fact it follows that the buffer runs at full speed (full bucket from the write cell and empty bucket in the read cell arrive at about the same time).

Fig. 3(c) shows the external communications. Note the phase shift between the input and output operations, which will be discussed in the next section.

Footnote: δ(A) stands for the delay of gate A.

Figure 3. Two communicating mousetrap cells



3.3. About asymmetries in delay

Several asymmetries play a role with respect to timing.

• The inputs of a standard cell typically have different delays. In our design this holds for the XNOR-gate and the latch (delay from enable to data-out differs from the delay from data-in to data-out). The asymmetry for the XNOR-gate can be used to reduce the timing requirements that follow from the extended isochronic fork by connecting the request signal to the faster input and the acknowledge signal to the slower one.

• The delays for rising and falling edges differ for all components. Therefore if the buffer runs at full speed, the high and the low periods of the handshake signals differ, which for a two-phase protocol implies that two consecutive communications take different times. Note that in Fig. 3(b) a rising edge in signal Req occurs before signal empty goes high, whereas for falling edges in signal Req it is the other way around.

• The most important asymmetry, however, is the difference between the times it takes to pass a full or an empty bucket. Passing a full bucket (δ(LATCH)) is faster than passing an empty bucket (δ(XNOR)+δ(LATCH)). Therefore the data valid period is shorter than the stored period. It also leads to a phase shift in the external handshakes of this two-place buffer, of which Wreq and Wack are the input handshake signals and Rreq and Rack are the output handshake signals (see Fig. 3(c)). In a perfectly symmetric buffer the input and output handshakes would occur exactly at the same time (without any phase shift), which means that such a two-place buffer running at full speed would always contain one data item. However, a mousetrap buffer with 2 * N cells running at full speed contains less than N data items. The asymmetry is not relevant for the maximum throughput of a free-running buffer; it only leads to a phase shift between the input and output operations. We will see, however, that in a synchronizing buffer (buffer synchronized by a clock signal) the asymmetry has consequences for the maximum throughput.

Due to the asymmetry a free-running buffer containing N data items and closed in a loop will not run at full speed. A similar kind of asymmetry has also been observed in [4].

4. Pipeline synchronization

A pipeline-synchronization buffer consists of three sections: a write section, an intermediate section and a read section. The write section synchronizes the input operations with the write clock, while the read section synchronizes the output operations with the read clock. The intermediate section is an asynchronous buffer which serves to decouple the two synchronizing sections. The design of all three sections is based on ripple buffers. The transformation from a ripple buffer into a synchronizing buffer is presented in [11]. This transformation is based on inserting in between two neighbouring cells a component that synchronizes the handshakes with a clock.

We define the maximum clock frequency of a synchronizing section as the maximum clock frequency at which that section can transfer one data item every clock cycle.

Figure 4. Four-phase wait component (WAIT4)

Figure 5. Two-phase wait component (WAIT2): (a) Design, (b) Total behaviour, (c) External behaviour
4.1. Synchronization based on clock phases

Presented in [11] are so-called WAIT-components which synchronize handshakes with a clock phase. The basic WAIT-component is a four-phase wait component (called WAIT4) which delays the completion of a four-phase handshake until an additional input signal Clk is high. Since signal Clk can go low when the handshake starts, a conflict can occur, implying that an arbiter is needed in the design of the component. Fig. 4 shows the design of the WAIT4-component based on a basic arbiter (also called mutex). As long as Clk is low, the arbiter grants the (virtual) right to proceed to the upper request, implying that handshakes are paused until Clk is high.

However, since the mousetrap buffer uses two-phase handshake signalling, we need a two-phase WAIT-component (called WAIT2). Fig. 5 shows the design of such a WAIT2-component based on a WAIT4-component.

Figure 6. Write phase-synchronizing buffer with two mousetrap cells
two signals R,q and Adc difi^^^ 

.^^"^ 4T02^in«ltas ihtt addition^ cAcqaty 
aeeded to convert a WArr4 toro a WAJT^coj^W 
Smce IMS drcoit is very dmllar » die cSJSlS^ 

* to dttfewnt from die one tS^ 

senttdm till. wMoh OSes two WAir4-CQmponenls. 

wltteb are boffisrs that synchronize tawdfifaalass with {he 
Phases ot a dock. For bodi die write and ito t^ZS, 

Fig. 6(a) shows the design of the write synchronizing buffer, which synchronizes the transfer of empty places. Since empty places are transferred via the acknowledge signal, this signal is synchronized with the write clock. Fig. 6(b) shows a timing simulation when the buffer runs at full speed. We see that full buckets (transitions in signal Wreq) arrive before the corresponding empty buckets (rising edges in signal empty).

Figure 7. Read phase-synchronizing buffer with two mousetrap cells

Fig. 7(a) shows the design of the read synchronizing buffer, which synchronizes the transfer of data items. Since data items are transferred by changing the request signal, this signal is synchronized with the clock. Fig. 7(b) shows that in this buffer, empty buckets (rising edges in signal empty) arrive before the corresponding full buckets (transitions in signal Wreq).

Both buffers, when running freely (wait components always enabled), offer a maximum throughput of about 570 Msymbols/sec. We now come back to the asymmetry in the time it takes to pass a full or an empty bucket. A synchronizing buffer running at full speed transfers one data item every clock cycle, which implies that the input and output operations of a two-place buffer occur simultaneously, synchronized by the clock. Therefore the phase shift that allowed each cell in a free-running buffer to operate at full speed does not exist in a synchronizing buffer. Consequently, in a synchronizing buffer the asymmetry leads to a decrease in the maximum throughput: the larger the asymmetry, the larger the performance loss.

We saw that in the free-running buffer the acknowledge path is δ(XNOR) slower than the request path. In the design of the write synchronizing buffer we have inserted a WAIT2-component in the acknowledge path, leading to a delay difference of δ(XNOR)+δ(WAIT2). Consequently, the full buckets arrive 1.2 ns before the empty ones, resulting in a maximum clock frequency of 316 MHz. In the design of the read synchronizing buffer, however, we have inserted a WAIT2-component in the request path, leading to a delay difference of max(δ(XNOR), δ(WAIT2)). Since δ(WAIT2) is larger than δ(XNOR), the empty buckets arrive before the full ones. The time difference is only 0.8 ns, resulting in a maximum clock frequency of 403 MHz.

4.2. Synchronization based on clock edges

The two-place synchronizing buffer based on WAIT-components can be transformed into a faster one by bringing the two WAIT2-components together (for instance in between the mousetrap cells). The two combined WAIT2-components form a so-called EDGE-component.

Fig. 8(a) shows the UE4-component that synchronizes the upgoing phase of a four-phase handshake with the rising edges of a clock. The component is constructed by connecting two WAIT4-components. Fig. 8(b) describes the total behaviour of the UE4-component, which consists of the parallel composition of the behaviour of two WAIT4-components. The first WAIT4-component waits until both the request signal is high and the clock signal is low and then makes intermediate signal a high. The second WAIT4-component waits until both signal a and the clock are high and then makes signal Ack high. When the request signal goes low, signal a and, consequently, signal Ack go low as well. At falling clock edges the inverter delay between the clock signals of the two WAIT-components takes care of closing the second WAIT-component before the first one is opened. If we abstract from the internal behaviour we get the external behaviour which is shown in Fig. 8(c).
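
The total behaviour of the up-edge component can be mimicked with a small state machine. The following Python sketch is an assumption-laden illustration rather than the circuit of Fig. 8: the class name `UpEdge4` is hypothetical, each call to `step` samples the inputs once, and the inverter delay, arbitration and metastability are not modelled.

```python
class UpEdge4:
    """Two chained wait stages: the first opens at clock low, the second at clock high."""

    def __init__(self) -> None:
        self.a = False    # intermediate signal between the two wait stages
        self.ack = False

    def step(self, req: bool, clk: bool) -> bool:
        if not req:               # request low: a and, consequently, ack go low
            self.a = False
            self.ack = False
            return self.ack
        if not clk:               # first stage: request high and clock low
            self.a = True
        if self.a and clk:        # second stage: a high and clock high
            self.ack = True
        return self.ack


u = UpEdge4()
inputs = [(True, True), (True, False), (True, True), (True, False), (False, False)]
trace = [u.step(req, clk) for req, clk in inputs]
print(trace)  # [False, False, True, True, False]
```

A request that arrives while the clock is already high is not acknowledged in that phase; it first has to see a low phase and then the following rising edge, which is the edge-synchronizing effect described in the text.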

To obtain a two-phase version of the EDGE-component (UE2) we can use the 4TO2-circuit again.

Two-place edge-synchronizing buffers can be obtained by connecting two free-running buffer cells and inserting a UE2-component in one of the interconnecting handshake signals: in the acknowledge signal for the write buffer and in the request signal for the read buffer. Compared to the phase-synchronizing buffers presented in the previous section, we now have reduced the number of 4TO2-circuits in a two-place buffer from two to one. Therefore, edge-synchronizing buffers are faster than phase-synchronizing ones. The maximum frequency is 478 MHz for the write buffer and 588 MHz for the read buffer.

An edge-synchronizing buffer offers the following advantages when compared to a phase-synchronizing one:

• it offers a higher maximum clock frequency;

• for the same clock frequency it gives a smaller probability of synchronization failure (since the circuit overhead is less);

• the timing of the external events in relation to rising clock edges is well-defined (which is very important when we add the circuitry offering the clocked interface);

• it has a behaviour that is independent of the ratio between the high and the low phase of the clock signal.

Figure 8. Four-phase up-edge component (UE4)

4.3. Total buffer design

Fig. 9(a) shows the write section with its clocked interface. The design is based on edge synchronization and it contains three mousetrap cells: A, B and C. The write side of cell A is connected to circuitry offering the clocked write interface to the environment. The flip-flop in the design has a clock enable signal Wenb. When cell A is empty, Wrdy is high and the write handshake signals are equal. The cell then waits for a rising edge of Wclk with Wenb high and when this happens it writes data in empty cell A by making the write request signal the inverse of the write acknowledge signal. The output signals of cell A must meet the setup and hold requirements with respect to clock signal Wclk. This holds not only for signal Wrdy but also for the write acknowledge signal. The timing of these external events is better defined for an edge-synchronizing buffer than for a phase-synchronizing one.

Figure 9. Edge-synchronizing write section

Clocked input cell A is followed by the cells B and C, which form a two-place edge-synchronizing buffer. This buffer synchronizes input operations with the write clock. Fig. 9(b) shows the empty signals of the three cells when the buffer operates at full speed. Just before the rising clock edge, we have the following situation: cell A is empty, and both cell B and C are full, containing the same data item (the acknowledge signal making cell B empty is being held up until the rising clock edge occurs). After the rising edge cell A becomes full, cell B becomes empty and cell C remains full. Subsequently, the asynchronous communication within the cell pairs (A and B) and (C and its read neighbour) restores the initial state before the next rising clock edge. Note that these cell pairs can be seen as master/slave pairs, which when operating at full speed continuously contain one data item. At every rising clock edge the master receives data and then this data is passed on asynchronously to the slave. However, when the buffer is full both the master and the slave contain a data item. Note also that when the complete buffer section is empty, the latches in all three cells are transparent. Therefore in that situation this buffer section has a very small latency of only three latch delays.

A similar analysis holds for the read section shown in Fig. 10(a), which contains the mousetrap cells X, Y and Z. Cell Z starts in the empty state, which implies Rrdy low and its read handshake signals equal. When Z becomes full, Rrdy goes high and the read request signal becomes different from the read acknowledge signal. As soon as a rising edge of Rclk occurs, the read handshake signals become equal, which brings the cell back to its initial state with Rrdy low. Fig. 10(b) shows the empty signals of the three cells when the buffer operates at full speed. Just before the rising clock edge, cell X is full, cell Y is empty and cell Z is full. After the rising edge cell X becomes empty, cell Y becomes full and cell Z becomes empty. Subsequently, the asynchronous communication within the cell pairs (X and its write neighbour) and (Y and Z) restores the initial state before the next rising clock edge. Note that the latency of this section is one clock tick.

Figure 10. Edge-synchronizing read section

By adding a two-place synchronizing buffer to one of the synchronizing sections, one can
asymptotically decrease the probability of synchronization failure at the cost of increasing
the latency.
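
The trade-off can be illustrated with the standard exponential metastability model. This model is not taken from the text above; it is a hedged sketch. The symbols are introduced here purely for illustration: k is the number of added two-place buffers, t_0 the settling time available without them, \Delta t the extra settling time assumed to be gained per added buffer, and \tau the resolution time constant of the synchronizer.

```latex
P_{\mathrm{fail}}(k) \;\propto\; \exp\!\left(-\frac{t_0 + k\,\Delta t}{\tau}\right)
```

Under these assumptions every added buffer multiplies the failure probability by the constant factor exp(-\Delta t/\tau) < 1, so the probability approaches zero without ever reaching it, while the latency grows linearly with k.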

4.4. Universal synchronization cell

The increase in performance that was obtained by going from clock-phase to clock-edge
synchronization came from the fact that for every pair of buffer cells we could reduce the
number of 4TO2-circuits from two to one. When comparing the 4TO2-circuit with the
synchronizing circuit of a mousetrap cell one can notice a strong similarity. It is therefore
tempting to integrate the synchronizing circuit in the mousetrap cell.

Fig. 11(a) shows the design of such an integrated cell. Since in our technology the delay of
the combinational circuitry consisting of the XNOR, XOR and AND-gate (which can largely be
simplified) is comparable to the delay of a single XNOR-gate, we have done away with the
overhead of the last 4TO2-circuit. With respect to performance, the design has an additional
advantage: by doing the clock synchronization on a signal that is common for the transfer of
both full and empty buckets, the asymmetry between the two transfer paths has become very
small.

Fig. 11(b) describes the behaviour of the cell. The cell, when empty, waits until its write
neighbour is full and then at the next rising clock edge it makes the latches transparent. By
doing this Wack becomes equal to Wreq and Rreq becomes unequal to Rack, which results in
making the latches opaque again. Since the cell waits until both conditions for a copy
operation hold (being empty and write neighbour being full) before it synchronizes with the
clock, it is universal in that it can be used for both write and read synchronization.
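
The behaviour described above can be sketched as a small behavioural model. This is an illustration under stated assumptions, not the circuit of Fig. 11(a): the class name UniversalCell and the method rising_edge are hypothetical, the two-phase convention "empty means Rreq equals Rack, write neighbour full means Wreq differs from Wack" is assumed, and the short transparent phase of the latches is collapsed into one atomic copy at the clock edge.

```python
from dataclasses import dataclass


@dataclass
class UniversalCell:
    """Behavioural sketch of a universal synchronization cell (hypothetical model)."""
    wack: bool = False   # acknowledge towards the write neighbour
    rreq: bool = False   # request towards the read neighbour
    rdat: object = None  # data held in the latch

    def rising_edge(self, wreq: bool, wdat: object, rack: bool) -> bool:
        """Apply one rising clock edge; return True if a copy took place."""
        # Assumed two-phase convention: equal request/acknowledge means "empty".
        empty = self.rreq == rack
        # Assumed two-phase convention: unequal request/acknowledge means "full".
        neighbour_full = wreq != self.wack
        if empty and neighbour_full:
            # Rdat := Wdat || Wack := Wreq || Rreq := not Rack
            self.rdat, self.wack, self.rreq = wdat, wreq, (not rack)
            return True
        return False  # conditions for a copy do not hold; the latches stay opaque


cell = UniversalCell()
print(cell.rising_edge(wreq=True, wdat="item", rack=False))  # copy happens
print(cell.rising_edge(wreq=False, wdat="next", rack=False))  # cell is full, no copy
```

After a copy, Wack equals Wreq and Rreq differs from Rack, so the same condition that triggered the copy is false again. In this model that corresponds to the latches becoming opaque.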

However, by including the synchronizing component in the feedback loop to the latch, we have
extended the timing requirements with respect to the extended isochronic fork. Since its write
neighbour is a free running cell, these extended timing requirements are not automatically
fulfilled. One can weaken the timing requirements by speeding up the closing of the latch by
means of an AND-gate after the UE4-component. But even in that case some additional delay in
signal Wack is needed to fulfill the timing requirements.

The maximum frequency for a synchronizing buffer based on the universal cell is 581 MHz, which
is only marginally faster than the maximum frequency of the read edge-synchronizing buffer,
but 20% faster than the write edge-synchronizing buffer. Compared to the write
edge-synchronizing buffer, a buffer based on universal cells has one disadvantage: the latency
of an empty two-place synchronizing buffer is one clock tick.
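
The quoted write-side gain can be checked against the figures in Table 1 with a few lines of arithmetic. The helper name relative_gain is hypothetical; the two frequencies are the universal-cell value stated above and the write-side value for the edge-synchronizing design in Table 1.

```python
def relative_gain(new_mhz: float, old_mhz: float) -> float:
    """Return the fractional speed-up of new_mhz over old_mhz."""
    return (new_mhz - old_mhz) / old_mhz


UNIVERSAL_MHZ = 581   # universal cell (text and Table 1)
EDGE_WRITE_MHZ = 478  # write edge-synchronizing buffer (Table 1)

gain = relative_gain(UNIVERSAL_MHZ, EDGE_WRITE_MHZ)
print(f"{gain:.1%}")  # prints 21.5%
```

The result of roughly 21.5% is consistent with the rounded figure of 20% given in the text.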

5. Concluding remarks

We presented several designs of a fast buffer that can be used to bridge clock domains in GALS
systems. The designs are based on two well-known ideas: pipeline synchronization and mousetrap
buffers. We first combined both ideas and then in two steps improved the design by increasing
its performance. Before doing the transformations we gave an elaborate analysis of the effect
that delay asymmetries have on the performance of the design. Combining both ideas resulted in
a design with clock phase synchronization. In the first transformation we replaced phase
synchronization by edge synchronization and in the second transformation we incorporated the
edge synchronization circuit in the mousetrap cell. The resulting cell is a universal cell
that can be used for both the read and the write side of the buffer.

Figure 11. Universal synchronization cell
(a) Design
(b) Behaviour: (Rdat := Wdat || Wack := Wreq || Rreq := ¬Rack)

If we compare the final design with the pipeline synchronization presented in [12], the
following differences can be noticed.

• The handshake synchronization is based on clock edges instead of clock phases, which reduces
  the amount of synchronization circuitry. Consequently, the synchronizing overhead is less,
  which means a higher maximum clock frequency or, for the same clock frequency, a smaller
  probability of synchronization failure. An additional advantage is that the timing of the
  external events in relation to rising clock edges is well-defined, which is very important
  when we add circuitry offering the clocked interface.

• The design of the two-phase synchronizer (called [...]metric synchronizer in [11]) uses one
  four-phase synchronizer instead of two.

• The similarity between the control circuitry of a mousetrap buffer cell and the circuitry
  transforming a four-phase synchronizer into a two-phase one allowed us to incorporate the
  synchronization in the mousetrap cell.

Since the first transformation is independent of the design of the self-timed buffer cell, it
can be applied in any pipeline synchronization design. The second transformation, however, is
restricted to pipeline synchronization in mousetrap buffers.

Table 1 shows, for the different designs, the maximum clock frequencies of both the write and
the read side of the buffer.

Design       Write side   Read side
Phase        316          403
Edge         478          588
Universal    581          581

Table 1. Maximum clock frequencies (MHz)
ID612437: Claims

input data signal, and an opaque state wherein the latched input data signal retains its value,
• a controller for the first latch comprising
  - a second latch for receiving a first control signal (Wreq) indicative for the validity of
    the input data signal (Wdat), and for providing a latched first control signal (Rreq), the
    second latch having a transparent state wherein the latched first control signal follows
    the first control signal, and an opaque state wherein the latched first control signal
    retains its value,
  - a comparator for comparing the latched first control signal (Rreq) with a second control
    signal (Rack) indicative whether the latched input data signal [...], and for providing a
    latch control signal for controlling the state of the [...]

[...]

3. Dataprocessing system according to claim 1, wherein the synchronization element
synchronizes one of the control signals with a transition of the clock signal.

4. Dataprocessing system according to claim 1, 2 or 3, wherein the synchronization element
synchronizes the second control signal (Rack) with the clock signal.

5. Dataprocessing system according to claim 4, wherein the pipeline comprises a first and a
second pipeline element and wherein the synchronization element [...]

6. Dataprocessing system according to claim 1, 2 or 3, wherein the synchronization element
synchronizes the first control signal (Wreq) with the clock signal.

7. Dataprocessing system according to claim 6, wherein the pipeline [...] element and wherein
the synchronization element [...] the first [...] second pipeline element by synchronizing
the latch [...]

8. Dataprocessing system according to claim 3, wherein the controller further comprises a
[...] latched first control signal (Rreq), the controller being arranged for providing the
latch control signal (S) in response to the first control signal (Wreq), the latched first
control signal (Rreq) and the second control signal [...], and wherein the synchronization
[...] the latch control signal (S) provided [...]
9. Dataprocessing system according to claim 1, comprises a comparison element, a four-phase
wait component and a latch, wherein the latch receives a request signal (Req) and provides the
latched request signal as an acknowledge signal (Ack), the comparison element comparing the
request signal (Req) and the acknowledge signal (Ack) and provides an input signal to the four-
phase wait component, the four-phase wait component synchronizing this signal with the clock
signal and providing a control signal to the latch.